NVIDIA's Gated DeltaNet-2 decouples the erase and write operations in linear attention, outperforming models such as Mamba-2 and KDA on long-context tasks.